Abstract: In this paper, we develop a new framework for sensing
and recovering structured signals. In contrast to compressive
sensing (CS) systems that employ linear measurements, sparse
representations, and computationally complex convex/greedy
algorithms, we introduce a deep learning framework that supports
both linear and mildly nonlinear measurements, that learns a
structured representation from training data, and that efficiently
computes a signal estimate. In particular, we apply a stacked
denoising autoencoder (SDA) as an unsupervised feature learner.
The SDA enables us to capture statistical dependencies between the
different elements of certain signals and to improve signal
recovery performance compared to the CS approach.

I. Introduction 

A. Motivation 

An inverse problem that occurs in a number of important
applications involves recovering a signal $x \in \mathbb{R}^N$
from a set of under-sampled measurements. This problem is
formulated as recovering $x \in \mathbb{R}^N$ from
$y = \Gamma(x)$, where $\Gamma(\cdot) : \mathbb{R}^N \to
\mathbb{R}^M$ can be either a linear or a nonlinear function and
$M \ll N$. Since this problem is ill-posed in general, one is able
to recover $x$ given $y$ and $\Gamma(\cdot)$ only if $x$ has some
type of structure such that applying $\Gamma(\cdot)$ reduces its
dimensionality from $N$ without losing information. Many
configurations for $x$ and $\Gamma(\cdot)$ have been explored in
the literature; however, one of the most useful is a sparse signal
$x$ and a linear $\Gamma(\cdot)$, i.e., $y = \Gamma(x) = \Phi x$.
Compressive sensing (CS) is a field that tries to solve this
linear inverse problem in the case that $x$ has a sparse
representation, i.e., there exists an $N \times N$ basis matrix
$\Psi = [\psi_1 | \psi_2 | \cdots | \psi_N]$ such that
$x = \Psi s$ and only $K \ll N$ of the coefficients $s$ are
nonzero. Therefore, CS is mainly concerned with recovering a
$K$-sparse signal $x \in \mathbb{R}^N$ from a set of under-sampled
linear measurements, i.e., from $M \ll N$ measurements acquired
via $y = \Phi x = \Phi \Psi s$, where $y \in \mathbb{R}^M$ is the
measurement vector and $\Phi \in \mathbb{R}^{M \times N}$ is the
measurement matrix.
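
To make the measurement model concrete, the following NumPy sketch
builds a $K$-sparse coefficient vector $s$, forms $x = \Psi s$, and
takes $M \ll N$ linear measurements $y = \Phi x$. The dimensions,
the random orthonormal $\Psi$, the Gaussian $\Phi$, and the seed are
illustrative assumptions rather than settings taken from the
experiments reported later.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions): ambient dimension, measurements, sparsity.
N, M, K = 256, 64, 8

# Orthonormal basis Psi; a random orthogonal matrix stands in for
# a wavelet or DCT basis purely for illustration.
Psi, _ = np.linalg.qr(rng.standard_normal((N, N)))

# K-sparse coefficient vector s.
s = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
s[support] = rng.standard_normal(K)

x = Psi @ s                                      # signal that is sparse in Psi
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # i.i.d. Gaussian measurement matrix
y = Phi @ x                                      # under-sampled measurements, M << N
```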

The measurement formulation $y = \Gamma(x)$ suggests that one
should answer the following questions to compressively acquire a
signal:

(i) How can the signal $x$ be recovered from a given measurement
vector $y$ and operator $\Gamma(\cdot)$?

(ii) How should the measurement operator $\Gamma(\cdot)$ be
designed?

(iii) If we are concerned with a particular type of structure, how
can we find a representation in which the signal $x$ has that
structure?

In the case of sparse $x$ and linear $\Gamma(\cdot)$, the CS
framework answers these three questions as follows:

(i) Using methods from convex optimization or greedy algorithms.

(ii) Using linear random matrices as measurement matrices.

(iii) Using a pre-specified set of transformations or a
data-dependent basis, such as wavelets, frames, and dictionaries.

Although there has been considerable progress in CS, and
particularly in the answers to the aforementioned questions, our
goal is to go beyond the state-of-the-art results. We approach
this goal by incorporating a deep learning framework into
structured signal recovery.

Deep learning is an emerging field mainly concerned with learning
multiple levels of representation of data, each at a higher level
of abstraction. In this paper we study the ability of deep neural
networks to recover structured signals (in particular images) from
their under-sampled random linear measurements.

The motivation for this work is the great success of deep
architectures in image representation. In particular, Hinton et
al. showed in [4] that one can achieve dimensionality reduction
in high-dimensional data by training a multilayer neural network
called an autoencoder. We compare the performance of the deep
learning approach with state-of-the-art algorithms for solving
the CS problem and show that deep architectures can outperform
them, at least in certain cases.

In the following three paragraphs, we briefly describe how deep
learning provides new opportunities to attack questions (i), (ii),
and (iii) above, and we compare these opportunities with the
answers given by the CS framework.

The first question is about recovering the original signal from
the measurement vector and matrix. In order to recover a sparse
signal $x \in \mathbb{R}^N$ from its corresponding measurement
vector $y \in \mathbb{R}^M$, one needs to seek the sparsest signal
$\hat{x}$ that agrees with the measurement vector $y$:

$$\hat{x} = \arg\min_{x'} \|x'\|_0 \quad \text{s.t.} \quad y = \Phi x', \qquad (1)$$

where $\|\cdot\|_0$ denotes the $\ell_0$-norm of a vector, which
counts the number of its nonzero elements. While it has been shown
that this optimization can recover a $K$-sparse signal using only
$O(K)$ measurements, solving (1) is an NP-hard problem. Therefore,
researchers have replaced the $\ell_0$-norm in (1) with its convex
relaxation, the $\ell_1$-norm, to convert (1) to a tractable and
stable linear programming problem. This linear program can be
solved either by convex optimization methods or by iterative
greedy algorithms, which are generally first-order methods and as
a result are more suitable for high-dimensional problems. In this
paper, we replace these algorithms, i.e., the convex optimization
based approaches and the greedy iterative algorithms that usually
converge in hundreds of iterations, with a feed-forward deep
neural network. We show that, as a result of using a feed-forward
deep neural network, we do not need to solve the linear program to
recover $x$, and hence signal recovery is much faster.
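
For reference, the kind of iterative first-order solver that the
feed-forward network replaces can be sketched as follows. The
sketch uses iterative soft-thresholding (ISTA) on the
$\ell_1$-regularized least-squares form of the relaxed problem.
ISTA is a standard technique chosen here purely for illustration
and is not one of the algorithms evaluated in this paper; the
regularization weight `lam`, the iteration count, and the problem
sizes are assumptions. Note that every iteration costs two
matrix-vector products with $\Phi$.

```python
import numpy as np

def soft_threshold(v, tau):
    """Element-wise shrinkage: the proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(Phi, y, lam=0.01, n_iter=500):
    """Minimize 0.5*||y - Phi x||_2^2 + lam*||x||_1 by iterative soft-thresholding."""
    # Step size 1/L, where L is the largest eigenvalue of Phi^T Phi.
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)          # gradient of the quadratic term
        x = soft_threshold(x - step * grad, step * lam)
    return x

# Illustrative problem (all sizes are assumptions).
rng = np.random.default_rng(0)
N, M, K = 128, 64, 5
x_true = np.zeros(N)
x_true[rng.choice(N, K, replace=False)] = rng.choice([-1.0, 1.0], K)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
y = Phi @ x_true

x_hat = ista(Phi, y)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
```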

The second question is about designing the measurement matrix.
Traditional approaches to designing the measurement matrix $\Phi$
focus on the properties $\Phi$ needs in order to preserve
information during dimensionality reduction, i.e., while mapping
$x \in \mathbb{R}^N$ to $y \in \mathbb{R}^M$ where $M \ll N$. One
important property of a measurement matrix that guarantees
successful sparse signal recovery with very high probability is
the restricted isometry property (RIP). While checking whether or
not a matrix has the RIP is an NP-complete problem, random
matrices whose elements are independent and identically
distributed (i.i.d.) Gaussian or Bernoulli random variables
satisfy the RIP with very high probability given
$M = O(K \log(N/K))$. The main drawback of random measurements is
that they are not optimally designed for the signal under
acquisition. Adaptive methods, in which each measurement is
designed based on the information obtained from previous
measurements, reduce uncertainty. The major problem with adaptive
sequential measurements in CS is time complexity, since each new
measurement depends on the information obtained from prior
measurements. In this paper we show how deep neural networks can
help us adapt the measurements to the signal under acquisition
instead of taking random measurements, and hence enhance the
performance of the overall system.
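
As a rough numerical illustration of the measurement count
$M = O(K \log(N/K))$, the sketch below draws an i.i.d. Gaussian
matrix and checks how well it preserves the energy of randomly
drawn $K$-sparse vectors. The constant `c` hidden by the
$O(\cdot)$ notation is a hypothetical choice, as are the sizes and
the seed. Testing a few hundred random sparse vectors is only a
sanity check: it does not certify the RIP, which must hold
uniformly over all $K$-sparse vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 1024, 10
c = 4                                     # hypothetical constant hidden by O(.)
M = int(np.ceil(c * K * np.log(N / K)))   # number of measurements

# i.i.d. Gaussian measurement matrix, scaled so that E||Phi x||^2 = ||x||^2.
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

# Energy ratio ||Phi x||^2 / ||x||^2 on random K-sparse vectors.
ratios = []
for _ in range(200):
    x = np.zeros(N)
    x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
    ratios.append(np.linalg.norm(Phi @ x) ** 2 / np.linalg.norm(x) ** 2)
ratios = np.array(ratios)

# Ratios close to 1 indicate that these particular vectors are nearly
# norm-preserved by Phi.
spread = (ratios.min(), ratios.max())
```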

Finally, the third question asks about finding a representation
in which the original signal $x$ has a specific structure. In the
CS framework, one is concerned with finding a basis $\Psi$ in
which $x$ has a sparse representation. It is well known that
$\Psi$ can be chosen from a prespecified set of transformations.
For example, natural images have a sparse representation in a
wavelet basis or in the DCT domain. The main drawback of these
prespecified bases is that they are handcrafted and, as a result,
restrictive in capturing complex dependencies between different
elements of a signal. More concretely, the main drawback of
representing an image in the wavelet domain is the assumption of
independence between wavelet coefficients. This point has
motivated researchers to develop models for capturing statistical
dependencies in real-world signals. However, these models are
also handcrafted and hence do not necessarily capture more
complex dependencies within a signal. The limited representation
power of these prespecified transformations has led researchers
to seek data-dependent bases, i.e., to learn a transformation
from a set of training examples. Deep learning is a framework
based on automating feature discovery and feature learning for
many machine learning tasks. Accordingly, in this paper we use
the deep learning framework to automate the process of finding a
representation for a class of signals under acquisition. We show
how the learned representation outperforms a prespecified set of
transformations. In particular, we focus on image data and show
how deep neural networks outperform the wavelet domain by
providing a better representation for dimensionality reduction.

To the best of our knowledge, this paper is the first to study
structured signal recovery from a set of under-sampled
measurements using a deep learning framework. However, there have
been several studies that use deep learning techniques to solve
inverse problems. These studies have focused on image denoising
[20], removing noisy patterns from images, and image
super-resolution.

The rest of this paper is organized as follows. Section II
introduces the network architecture we have used to solve the
structured signal recovery problem. In Section III we discuss
the probabilistic interpretation for using stacked denoising
autoencoders (SDA) in solving the structured signal recovery
problem. Section IV contains the simulation results. Finally,
Section V concludes the paper.

II. Stacked Denoising Autoencoders for
Structured Signal Recovery

As mentioned above, natural images have a sparse representation
in a wavelet basis or in the DCT domain. Therefore, they can be
compressively acquired and reconstructed using the CS framework.
In this section we introduce our deep architecture for solving
this CS recovery problem. In Section IV we compare the
performance of the proposed method with other state-of-the-art
approaches from the CS framework.

We divide our solution into two scenarios. First, we consider
fixed linear measurements, which is the traditional CS
measurement paradigm. Second, we introduce a new measurement
paradigm, namely nonlinear adaptive compressive measurements,
inspired by the architecture and capability of neural networks.
In Section IV we show that incorporating nonlinearity in the
measurements enhances the overall recovery performance.

A. SDA + Linear Measurement Paradigm 

In the linear measurement paradigm, the measurement vector $y$ is
represented as $y = \Phi x$, i.e., each $y_i$ ($1 \le i \le M$) is
a linear combination of the $x_j$s ($1 \le j \le N$). We consider
the typical supervised learning framework where our training set
$\mathcal{D}_{train}$ has $l$ pairs consisting of original signals
and their corresponding measurements, i.e.,
$\mathcal{D}_{train} = \{(y^{(1)}, x^{(1)}), (y^{(2)}, x^{(2)}),
\ldots, (y^{(l)}, x^{(l)})\}$. Based on this training set, we
would like to learn a nonlinear mapping from a measurement vector
$y$ to its original signal $x$. We then test the performance of
the trained deep architecture on our test set
$\mathcal{D}_{test}$, which has $s$ pairs consisting of original
signals and their corresponding measurements, i.e.,
$\mathcal{D}_{test} = \{(y^{(1)}, x^{(1)}), (y^{(2)}, x^{(2)}),
\ldots, (y^{(s)}, x^{(s)})\}$.

Among the traditional sparse recovery algorithms, the greedy and
iterative ones are faster than those based on convex optimization
techniques such as linear programming. Each iteration of these
greedy or iterative algorithms includes a matrix-vector
multiplication, which has a computational cost of $O(MN)$. This
fact inspired us to design the SDA architecture such that its
recovery speed would be competitive with the existing fast
iterative algorithms, while also resembling an iterative message
passing algorithm. Therefore, each layer of the SDA used for
sparse recovery either has input size $N$ (the ambient dimension
of the original signal) and output size $M$ (the dimension of the
measurement vector) or vice versa. We use a 3-layer SDA where
each layer applies a nonlinearity to an affine transformation of
its input. More formally, the first hidden layer, which receives
the measurement vector as its input, is formulated as

$$x_{h_1} = \mathcal{T}(W_1 y + b_1), \qquad (2)$$

where $W_1 \in \mathbb{R}^{N \times M}$ and $b_1 \in \mathbb{R}^N$
are the weight matrix and bias vector of the first layer,
respectively, and $\mathcal{T}(\cdot)$ is the nonlinearity applied
element-wise to the affine transform of the input. We use the
sigmoid function as the nonlinearity; therefore,
$\mathcal{T}(z) = \frac{1}{1 + e^{-z}}$. Given the weight matrix
$W_1$ and the bias vector $b_1$, the computational cost of
calculating $x_{h_1}$ is $O(MN)$ according to (2), which is the
same as the cost of one iteration of an iterative algorithm for
the CS recovery problem. In order to keep this computational cost
at each layer, the second hidden layer and the output layer are
formulated as

$$x_{h_2} = \mathcal{T}(W_2 x_{h_1} + b_2) \quad \text{and} \quad \hat{x} = \mathcal{T}(W_3 x_{h_2} + b_3). \qquad (3)$$

In (3), $W_2 \in \mathbb{R}^{M \times N}$ and
$b_2 \in \mathbb{R}^M$ are the weight matrix and bias vector of
the second layer, respectively. Similarly,
$W_3 \in \mathbb{R}^{N \times M}$ and $b_3 \in \mathbb{R}^N$ are
the weight matrix and bias vector of the output layer,
respectively. We denote the output of the SDA and its set of
parameters by $\hat{x}$ and
$\Omega_L = \{W_1, b_1, W_2, b_2, W_3, b_3\}$, respectively.
Therefore, we can define the nonlinear mapping
$\hat{x} = \mathcal{M}_L(y, \Omega_L)$ and use the mean squared
error (MSE) as the loss function over the training set
$\mathcal{D}_{train}$:

$$\mathcal{L}(\Omega_L) = \frac{1}{l} \sum_{i=1}^{l} \|\mathcal{M}_L(y^{(i)}, \Omega_L) - x^{(i)}\|_2^2. \qquad (4)$$

We use the backpropagation algorithm to minimize the loss
function defined in (4). Figure 1 shows the SDA structure fed by
linear measurements of the original signal.
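
The mapping $\mathcal{M}_L$ in (2)-(3) and the loss (4) can be
sketched in a few lines of NumPy. This is only an illustration of
the layer shapes and the forward pass: the parameters are random
and untrained, and the initialization scale, the problem sizes,
and the stand-in signals are assumptions. Training by
backpropagation is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(N, M, rng, scale=0.1):
    """Random (untrained) Omega_L with the shapes given in the text."""
    shapes = [(N, M), (M, N), (N, M)]          # W1, W2, W3
    return [(scale * rng.standard_normal(s), np.zeros(s[0])) for s in shapes]

def sda_linear_forward(y, params):
    """x_hat = M_L(y, Omega_L): sigmoid layers of sizes M -> N -> M -> N."""
    h = y
    for W, b in params:
        h = sigmoid(W @ h + b)                 # each layer costs O(MN)
    return h

def mse_loss(Y, X, params):
    """Eq. (4): average squared l2 error over the pairs (y_i, x_i)."""
    errs = [np.sum((sda_linear_forward(y, params) - x) ** 2)
            for y, x in zip(Y, X)]
    return np.mean(errs)

# Illustrative data (assumptions): l stand-in signals with entries in [0, 1].
rng = np.random.default_rng(0)
N, M, l = 64, 16, 10
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
X = rng.uniform(0.0, 1.0, size=(l, N))
Y = X @ Phi.T                                  # y_i = Phi x_i

params = init_params(N, M, rng)
loss = mse_loss(Y, X, params)
```

Because the output layer is a sigmoid, every reconstructed entry
lies in (0, 1), which matches signals whose values have been
normalized to that range.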

B. SDA + Nonlinear Measurement Paradigm 

Fig. 1: Stacked denoising autoencoder (SDA) for recovering a
sparse signal from its linear measurements. This is equivalent
to a 3-layer neural network that is fed with linear measurements
of the original signal and tries to reconstruct it.

The structure of the SDA for the nonlinear measurement paradigm
is almost the same as the one in Section II-A. The only
difference is that we consider the mapping from the original
signal to its measurement vector as one layer of the SDA. This
extra layer lets the SDA adapt its structure to the training set
$\mathcal{D}_{train}$. Therefore, if we have enough data, e.g.,
many natural images (as in the ImageNet dataset), we can hope
that the measurement matrix is well adapted to the class of
signals under acquisition. We denote this extra layer, which is
the first layer of the SDA, by

$$y = \mathcal{F}(W_1 x + b_1), \qquad (5)$$

where $\mathcal{F}(\cdot)$ is the nonlinearity used to take
measurements from the original signal $x$. $\mathcal{F}(\cdot)$
can be the sigmoid function used in the other layers of the
network, another type of nonlinearity, or even the identity
function, in which case the measurements are linear, as in the
traditional CS framework, and at the same time adapted to the
acquired signals. We denote the parameter set of this SDA by
$\Omega_{NL} = \{W_1, b_1, W_2, b_2, W_3, b_3, W_4, b_4\}$ and its
output by $\hat{x} = \mathcal{M}_{NL}(x, \Omega_{NL})$. The loss
function corresponding to this SDA is similar to (4) with some
minor changes:

$$\mathcal{L}(\Omega_{NL}) = \frac{1}{l} \sum_{i=1}^{l} \|\mathcal{M}_{NL}(x^{(i)}, \Omega_{NL}) - x^{(i)}\|_2^2. \qquad (6)$$

Figure 2 shows the SDA structure for the nonlinear measurement
paradigm. The next section describes how these SDA structures
can be related to the CS recovery problem from a probabilistic
point of view.
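
A minimal NumPy sketch of the 4-layer mapping $\mathcal{M}_{NL}$
is given below, with the measurement layer (5) followed by three
recovery layers. The parameters are random and untrained, and the
sizes and initialization scale are assumptions; the point is only
to show that the measurement nonlinearity can be swapped (sigmoid
or identity) while the recovery layers stay the same.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def identity(z):
    return z

def init_params(N, M, rng, scale=0.1):
    """Random (untrained) Omega_NL: W1 is the learned measurement layer."""
    shapes = [(M, N), (N, M), (M, N), (N, M)]   # W1, W2, W3, W4
    return [(scale * rng.standard_normal(s), np.zeros(s[0])) for s in shapes]

def nl_sda_forward(x, params, measure_fn=sigmoid):
    """Return (y, x_hat): eq. (5) followed by three sigmoid recovery layers."""
    W1, b1 = params[0]
    y = measure_fn(W1 @ x + b1)                 # N -> M measurement layer
    h = y
    for W, b in params[1:]:                     # M -> N -> M -> N
        h = sigmoid(W @ h + b)
    return y, h

rng = np.random.default_rng(0)
N, M = 64, 16
params = init_params(N, M, rng)
x = rng.uniform(0.0, 1.0, size=N)

y_nl, x_hat_nl = nl_sda_forward(x, params)                          # nonlinear measurements
y_lin, x_hat_lin = nl_sda_forward(x, params, measure_fn=identity)   # linear but learned
```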

III. Probabilistic Relation Between SDA and 
Compressive Sensing 

In this section we provide a probabilistic interpretation that
explains the success of the SDA in solving the structured signal
recovery problem. As introduced in Section II, the deep network
that we use to solve this problem is a stacked version of
denoising autoencoders. At the first layer, this deep network is
fed a training example that is either the compressed measurements
of an image in the training set (Figure 1) or the original image
itself (Figure 2). Each subsequent layer is then fed the latent
representation (or output code) of the denoising autoencoder in
the previous layer. We perform unsupervised pre-training on this
deep architecture, which is justified in [24]. The authors of
[24] explain how unsupervised pre-training helps the
corresponding optimization problem in deep networks by
initializing the weights of all layers in a region near a good
local minimum of the loss function.

Fig. 2: SDA for recovering a sparse signal from its nonlinear
measurements. This is equivalent to a 4-layer neural network that
takes nonlinear and adaptive measurements from the original
signal. The nonlinear function used for taking measurements can
differ from the nonlinearity used in the other layers. However,
it should be analytically tractable in order to fit in the
backpropagation framework.

In the stacked version of denoising autoencoders, the
unsupervised pre-training phase is done one layer at a time.
Each layer of this deep network is pre-trained as a denoising
autoencoder. In other words, it is trained by minimizing the
error in reconstructing its input (that is, the output code of
the previous layer) from a noisy version of it. As we proceed in
the pre-training phase, once the first $t$ layers are trained, we
can compute the corresponding latent representation (or output
code) of the first $t$ layers and use it as the input for
training the $(t+1)$-th layer.
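
The layer-wise procedure can be sketched as follows. Each layer is
trained as a small denoising autoencoder with plain full-batch
gradient descent, and the clean codes of the trained layer become
the input of the next one. The untied decoder weights, the
learning rate, the number of epochs, the layer sizes, and the
stand-in data are all assumptions made for this illustration; only
the additive Gaussian corruption follows the description in the
text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(X, n_hidden, rng, noise_std=0.2, lr=0.1, n_epochs=50):
    """Train one denoising autoencoder on the rows of X.

    Returns the encoder (W, b) and the per-epoch reconstruction loss.
    """
    n, d = X.shape
    W = 0.1 * rng.standard_normal((n_hidden, d))   # encoder weights
    b = np.zeros(n_hidden)
    V = 0.1 * rng.standard_normal((d, n_hidden))   # decoder weights (untied)
    c = np.zeros(d)
    losses = []
    for _ in range(n_epochs):
        X_noisy = X + noise_std * rng.standard_normal(X.shape)
        H = sigmoid(X_noisy @ W.T + b)             # codes, shape (n, n_hidden)
        R = sigmoid(H @ V.T + c)                   # reconstructions, shape (n, d)
        losses.append(np.mean(np.sum((R - X) ** 2, axis=1)))
        # Backpropagate the mean squared error against the CLEAN input.
        dA = 2.0 * (R - X) / n * R * (1.0 - R)     # at decoder pre-activation
        dZ = (dA @ V) * H * (1.0 - H)              # at encoder pre-activation
        V -= lr * dA.T @ H
        c -= lr * dA.sum(axis=0)
        W -= lr * dZ.T @ X_noisy
        b -= lr * dZ.sum(axis=0)
    return (W, b), losses

def pretrain_stack(X, layer_sizes, rng):
    """Greedy layer-wise pre-training: layer t+1 is trained on the codes of layer t."""
    params, histories, inp = [], [], X
    for n_hidden in layer_sizes:
        (W, b), losses = pretrain_layer(inp, n_hidden, rng)
        params.append((W, b))
        histories.append(losses)
        inp = sigmoid(inp @ W.T + b)               # clean codes feed the next layer
    return params, histories

rng = np.random.default_rng(0)
X = rng.uniform(0.6, 1.0, size=(200, 16))          # stand-in training inputs
params, histories = pretrain_stack(X, [32, 16], rng)
```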

An important aspect of this pre-training phase is the connection
between the Restricted Boltzmann Machine (RBM) [25] and the
denoising autoencoder. The RBM is a generative model that can
learn the probability distribution underlying its input data. It
has a set of visible units $v$ and a set of hidden units $h$.
Since it is an energy-based model [26], it associates a scalar
energy $E(v, h)$ with each configuration of the visible and
hidden units. The joint probability distribution is defined as
$P(v, h) = \frac{e^{-E(v, h)}}{Z}$, where $Z$ is the partition
function used for normalizing the probability distribution.
Training an RBM is equivalent to configuring its energy function
such that desirable configurations have low energy. As an
example, the energy function corresponding to the
Gaussian-Bernoulli RBM [27] with real-valued visible units $v$
and binary hidden units $h$ is

$$E(v, h \mid W, b, c) = \sum_{i=1}^{n_v} \frac{(v_i - b_i)^2}{2\sigma_i^2} - \sum_{i=1}^{n_v} \sum_{j=1}^{n_h} W_{ij} h_j \frac{v_i}{\sigma_i} - \sum_{j=1}^{n_h} c_j h_j, \qquad (7)$$

where $W_{ij}$ denotes the weight connecting visible unit $i$ and
hidden unit $j$, $\sigma_i$ is the standard deviation associated
with the $i$-th Gaussian visible unit, and $b_i$ and $c_j$ denote
the biases of the visible and hidden units, respectively.
Training an RBM in this case is the process of adjusting the
weights $W_{ij}$ and biases $b_i$ and $c_j$ such that the
probability distribution it represents fits the training data as
well as possible.
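
For concreteness, the Gaussian-Bernoulli energy (7) can be
evaluated directly. The function below is a plain transcription
of the three sums, with `W` stored as an $n_v \times n_h$ array;
the small numerical example is hypothetical and chosen so that
the result can be checked by hand.

```python
import numpy as np

def gb_rbm_energy(v, h, W, b, c, sigma):
    """Energy of a Gaussian-Bernoulli RBM configuration, eq. (7).

    v, b, sigma: arrays of length n_v; h, c: arrays of length n_h;
    W: array of shape (n_v, n_h).
    """
    quadratic = np.sum((v - b) ** 2 / (2.0 * sigma ** 2))
    interaction = (v / sigma) @ W @ h
    hidden_bias = c @ h
    return quadratic - interaction - hidden_bias

# Hypothetical toy configuration with two visible and two hidden units.
v = np.array([1.0, 2.0])
h = np.array([1.0, 0.0])
W = np.eye(2)
b = np.zeros(2)
c = np.array([0.5, 0.5])
sigma = np.ones(2)

# quadratic = (1 + 4) / 2 = 2.5, interaction = 1.0, hidden_bias = 0.5
energy = gb_rbm_energy(v, h, W, b, c, sigma)
unnormalized_prob = np.exp(-energy)   # P(v, h) up to the partition function Z
```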

An energy function for autoencoders has been derived by
interpreting them as dynamical systems [29]. In particular, it
has been shown in [28] that the energy function of an
autoencoder with a sigmoidal hidden layer and real-valued
observations is identical to the free energy of the corresponding
RBM with Gaussian visible units and binary hidden units.

Suppose that we want to train a denoising autoencoder with a
sigmoidal hidden layer to compress signals drawn from a given
probability distribution. This training is equivalent to learning
a set of weights and biases that result in low energy for signals
from that distribution. In other words, it is equivalent to
adjusting the weights and biases such that the error in
reconstructing signals (drawn from that distribution) from their
compressed representation is as small as possible.

As an example, suppose that in our training set
$\mathcal{D}_{train} = \{(y^{(1)}, x^{(1)}), \ldots,
(y^{(l)}, x^{(l)})\}$, the original signals $x^{(i)}$ are drawn
from a probability distribution $\mathcal{P}$. As derived in
[28], the energy function of a denoising autoencoder with
sigmoidal hidden units is

$$E(x) = \sum_j \log\left(1 + \exp(W_j^T x + b_j)\right) - \frac{1}{2} \|x - c\|_2^2 + \text{const}, \qquad (8)$$

where $W_{ij}$ denotes the weights connecting visible and hidden
units ($W_j$ being the weight vector of hidden unit $j$), and
$b_j$ and $c_i$ are the biases of the hidden layer and the
reconstruction layer, respectively. Training a denoising
autoencoder on the training set $\mathcal{D}_{train}$ is in this
case the process of adjusting the weights $W_{ij}$ and biases
$b_j$ and $c_i$ such that the reconstruction error for any signal
drawn from the probability distribution $\mathcal{P}$ (and not
necessarily in $\mathcal{D}_{train}$) is as small as possible.
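
The energy (8) is equally direct to compute. In the sketch below
the rows of `W` play the role of the vectors $W_j$, the additive
constant is dropped, and `np.logaddexp(0, z)` is used as a
numerically stable way of evaluating $\log(1 + e^{z})$. The
example values are hypothetical.

```python
import numpy as np

def autoencoder_energy(x, W, b, c):
    """Eq. (8) without the constant: sum_j log(1 + exp(W_j^T x + b_j)) - 0.5*||x - c||^2.

    W has shape (n_hidden, n_visible); row j is W_j.
    """
    pre = W @ x + b                                   # one value per hidden unit
    softplus_sum = np.sum(np.logaddexp(0.0, pre))     # stable log(1 + exp(.))
    return softplus_sum - 0.5 * np.sum((x - c) ** 2)

# Hypothetical toy parameters: 3 hidden units, 2 visible units.
W = np.zeros((3, 2))
b = np.zeros(3)
c = np.array([1.0, 1.0])

e_at_c = autoencoder_energy(np.array([1.0, 1.0]), W, b, c)   # quadratic term vanishes
e_away = autoencoder_energy(np.array([3.0, 1.0]), W, b, c)   # quadratic term is 0.5 * 4
```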

Similarly, suppose that we want to train a denoising autoencoder
with a sigmoidal hidden layer to decompress data that originally
(i.e., before compression) comes from the probability
distribution $\mathcal{P}$. In this case, the autoencoder learns
a set of weights and biases that result in low energy for
compressed signals drawn originally from $\mathcal{P}$. In other
words, training is equivalent to adjusting the weights and biases
such that the error in reconstructing compressed signals (drawn
originally from $\mathcal{P}$) from their decompressed
representation in the hidden layer is as small as possible. This
training ends up retrieving the original signals (from
$\mathcal{P}$) as the decompressed representations in the hidden
layer, since they are the origin of the compressed data.

This is fairly similar to the optimization problem in (1). In (1)
we have the measurement vector (compressed data), we know the
original signal model ($K$-sparse), and the goal is to retrieve
the original signal from the compressed measurements. The
considerable difference, though, is that in (1) we need an
optimization algorithm to retrieve the signal from its
measurements, whereas with an autoencoder (or deep networks in
general) we only need to pass the compressed data through a
trained feedforward network, without solving an optimization
problem.

Once we are done with pre-training all the layers, we perform
supervised fine-tuning of the weights and biases of the
pre-trained SDA. More precisely, we take the encoding part of
each denoising autoencoder, stack them together, and use the
backpropagation algorithm to minimize the MSE in reconstructing
the images from their compressed measurements.

IV. Simulation Results 

In this section we study the performance of our proposed
framework for structured signal recovery and compare it with
state-of-the-art results. We first describe the implementation of
the proposed models. We then compare our method with other CS
recovery algorithms in terms of both reconstruction quality
(PSNR) and recovery speed.
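
Since reconstruction quality is reported as PSNR throughout this
section, a short reference computation may be useful. The sketch
uses the standard definition $10 \log_{10}(\text{peak}^2 /
\text{MSE})$; taking the peak value as 1 assumes pixel values
normalized to the range 0 to 1, and whether the reported numbers
were computed with exactly this convention is an assumption.

```python
import numpy as np

def psnr(x, x_hat, peak=1.0):
    """Peak signal-to-noise ratio in dB between an image and its reconstruction."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    mse = np.mean((x - x_hat) ** 2)
    if mse == 0:
        return float("inf")                    # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Hypothetical example: a constant error of 0.1 on every pixel gives MSE = 0.01.
reference = np.zeros((32, 32))
reconstruction = np.full((32, 32), 0.1)
value_db = psnr(reference, reconstruction)
```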


A. Implementation 

Autoencoders are very similar to the multilayer perceptron (MLP)
in structure: all the units in the input layer of an autoencoder
are connected to all the units in the hidden layer, and similarly
all the units in the hidden layer are connected to all the units
in the output layer. Therefore, as the image size grows, we have
to train a larger network as well. This greatly increases the
computational cost of running the backpropagation algorithm, in
addition to increasing the chance of overfitting.

For this reason, we design a neural network for recovering small
sub-images (by sub-image we mean a large image patch) instead of
a whole large image. This does not prevent us from compressively
measuring and recovering large images: we can decompose a large
image into several non-overlapping or overlapping sub-images and
compressively measure and recover each of them. In the case of
non-overlapping sub-images, we obtain a blocky reconstruction of
the original image by placing the reconstructed neighboring
sub-images beside each other. If, on the other hand, we work with
overlapping sub-images, we place each recovered sub-image at its
corresponding location in the original image and average over the
overlapping regions. Figure 3 shows a visualization of
overlapping sub-images in a larger image.
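
The decomposition and averaging steps can be sketched as below.
A stride equal to the sub-image size gives the non-overlapping
(blocky) case, and a smaller stride gives overlapping sub-images
whose contributions are averaged. The stride of 16 used in the
example is an assumption, since the amount of overlap is not
stated here; in practice each extracted sub-image would be
measured and recovered before reassembly.

```python
import numpy as np

def extract_patches(img, size=32, stride=16):
    """Return the sub-images of a 2-D image and their top-left coordinates."""
    H, W = img.shape
    coords = [(r, c)
              for r in range(0, H - size + 1, stride)
              for c in range(0, W - size + 1, stride)]
    patches = np.stack([img[r:r + size, c:c + size] for r, c in coords])
    return patches, coords

def assemble(patches, coords, shape, size=32):
    """Place each sub-image at its location and average where sub-images overlap."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for p, (r, c) in zip(patches, coords):
        acc[r:r + size, c:c + size] += p
        cnt[r:r + size, c:c + size] += 1
    return acc / np.maximum(cnt, 1)            # avoid division by zero off-grid

# Hypothetical 64 x 64 image; recovery is replaced by the identity here.
img = np.random.default_rng(0).uniform(0.0, 1.0, size=(64, 64))
patches, coords = extract_patches(img, size=32, stride=16)
rebuilt = assemble(patches, coords, img.shape, size=32)
```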


In this paper, we have trained our deep neural network on
sub-images of size 32 x 32. We used natural images from the
ILSVRC 2014 ImageNet dataset [31] for both training and testing
the network. For dataset preparation, we extracted the central
256 x 256 part of each image, converted it to grayscale, and
chopped it into 32 x 32 sub-images. During the training phase, we
did not use overlapping sub-images, so as to have a diverse
training set. During the test phase, however, we use overlapping
sub-images and the averaging method described earlier in this
section. We normalized the pixel values of each image to lie
between 0 and 1. Following results from the literature, we
sampled the initial weights of our neural network from a uniform
distribution
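
The dataset preparation steps described above (central 256 x 256
crop, grayscale conversion, scaling to the range 0 to 1, and
chopping into non-overlapping 32 x 32 sub-images) can be sketched
as follows. The 8-bit RGB input format and the luma weights used
for the grayscale conversion are assumptions; the text does not
say how the conversion was done.

```python
import numpy as np

def prepare_subimages(rgb, crop=256, size=32):
    """Turn one RGB image (H, W, 3), uint8, into non-overlapping grayscale tiles."""
    H, W, _ = rgb.shape
    top, left = (H - crop) // 2, (W - crop) // 2
    center = rgb[top:top + crop, left:left + crop].astype(np.float64)

    # ASSUMPTION: ITU-R BT.601 luma weights for RGB -> grayscale.
    gray = center @ np.array([0.299, 0.587, 0.114])
    gray /= 255.0                               # scale 8-bit values to [0, 1]

    k = crop // size                            # tiles per side (8 for 256 / 32)
    tiles = (gray.reshape(k, size, k, size)     # [tile_row, y, tile_col, x]
                 .swapaxes(1, 2)                # [tile_row, tile_col, y, x]
                 .reshape(-1, size, size))
    return tiles

# Hypothetical input image of size 300 x 400.
rgb = np.random.default_rng(0).integers(0, 256, size=(300, 400, 3), dtype=np.uint8)
tiles = prepare_subimages(rgb)
```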
Fig. 3: Overlapping sub-images of a larger image.

Fig. 4: Test images for Table I.


As mentioned earlier, we use denoising autoencoders as the
building blocks of our network. Therefore, in the pre-training
phase we corrupt the input of each layer with zero-mean Gaussian
noise with a standard deviation of 0.2 and let the layer
reconstruct its input from the corrupted version. We then use
the backpropagation algorithm to fine-tune the weights and
biases. We implemented our deep neural networks using the Theano
package and used a GPU on the Amazon Web Services (AWS) platform.

B. Comparison with Other Methods 

In this section we compare the performance of structured signal
recovery using deep neural networks with other recovery
algorithms. These other algorithms include:

• The denoising-based approximate message passing (D-AMP)
algorithm, one of the state-of-the-art methods.

• Total variation (TV) minimization, which is known for its
intriguing properties for image recovery.

• The parameterless approximate message passing (P-AMP)
algorithm, employing sparsity in the wavelet domain.

• The tiled version of D-AMP (Tiled D-AMP).

TABLE I: Quality of reconstruction (PSNR in dB) for different
images in Figure 4 and different algorithms. The under-sampling
ratio ($M/N$) is assumed to be constant and equal to 0.25 in all
cases.

Image       L-SDA   NL-SDA   D-AMP   O-NL-SDA   Tiled D-AMP   TV
Damselfly   29.01   30.32    45.97   30.85      30.51         27.46
Birds       24.93   26.19    25.58   26.62      24.45         21.46
Rabbit      25.42   26.80    26.37   27.24      25.04         19.79
Turtle      31.07   33.79    34.17   34.65      32.08         27.16
Dog         19.76   21.03    19.71   21.55      18.45         13.96
Eagle Ray   25.00   26.18    25.37   26.57      24.68         16.30
Boat        29.67   31.96    41.75   33.11      33.49         27.63
Monkey      28.33   29.74    34.00   30.32      28.66         26.47
Panda       19.82   20.68    19.66   21.00      18.61         18.31
Snake       16.42   17.39    16.33   17.72      15.46         10.42

By Tiled D-AMP we mean D-AMP applied to non-overlapping
sub-images, similar to what we described for the SDA-based
methods in Section IV-A. We introduce Tiled D-AMP for the sake of
fairness: since the huge computational complexity prevented us
from training SDAs for recovering large images, we wanted to
compare against the performance of D-AMP when it is applied in
the same way as the SDA-based methods, i.e., recovering images
from a blocky reconstruction of smaller sub-images.

Table I summarizes the results for recovering the 10 images in
Figure 4. As is clear from Table I, there is no obvious winner
among the different methods. In some cases, L-SDA (SDA + Linear
Measurements), NL-SDA (SDA + Nonlinear Measurements), and
O-NL-SDA (SDA + Overlapping Nonlinear Measurements) perform
better than D-AMP and min-TV.

More specifically, according to our simulation results, whenever
the acquired image has an irregular structure, such that there
are not too many similar patches in the image, L-SDA, NL-SDA, and
O-NL-SDA perform better than D-AMP and min-TV.

For example, in Figure 4 the dog and panda images have irregular
structure and texture. As we can see in Table I, for these images
all the recovery methods based on deep learning perform better
than D-AMP, which is one of the state-of-the-art CS recovery
algorithms. On the other hand, the damselfly and boat images have
very smooth and regular structure and hence many similar patches.
As a result, we see from Table I that D-AMP outperforms the
methods based on deep learning, since D-AMP exploits these
similar patches to enhance the image reconstruction.

One more interesting point about Table I is that O-NL-SDA
outperforms Tiled D-AMP on every image except the boat, and
NL-SDA does so on every image except the damselfly and the boat.
This shows the potential of deep learning: if we were able to
train huge networks in a reasonable time, or to come up with
another network structure, we might be able to outperform D-AMP
in almost all cases. We leave this problem as an avenue for
future work.

Figures 5, 6, and 7 show three examples of sets of reconstructed
images. As we can see, in Figure 6, which does not have a regular
and smooth structure, NL-SDA and O-NL-SDA outperform D-AMP. In
Figure 7, which has both regular and irregular textures, D-AMP
outperforms L-SDA and NL-SDA; however, O-NL-SDA outperforms
D-AMP. Finally, in Figure 5, which mostly has a smooth and
regular texture, D-AMP outperforms the SDA-based methods,
although O-NL-SDA outperforms Tiled D-AMP.

Fig. 5: Reconstructed monkey image using different algorithms
with $M/N = 0.4$. Clockwise from upper left: SDA + Linear
Measurements (PSNR = 29.96 dB), SDA + Nonlinear Measurements
(PSNR = 31.15 dB), D-AMP (PSNR = 38.48 dB), TV (PSNR = 29.67 dB),
Tiled D-AMP (PSNR = 31.56 dB), Overlapping SDA + Nonlinear
Measurements (PSNR = 32 dB).

Although there is no clear winner among the different methods in
terms of reconstruction quality, our simulation results show that
L-SDA and NL-SDA beat the other methods in terms of
reconstruction time. This is clear both intuitively and
mathematically, since all the other methods need to solve an
optimization problem, using either convex optimization techniques
(e.g., linear programming) or greedy algorithms. L-SDA and
NL-SDA, in contrast, do not need to solve any optimization
problem; they just use a feed-forward neural network to recover
images from their measurements. Table II shows the reconstruction
time for different recovery algorithms and different
under-sampling ratios. As we can see, for an under-sampling ratio
of 0.4, NL-SDA and L-SDA are almost 1,000,000 times faster than
D-AMP.

Fig. 6: Reconstructed dog image using different algorithms
with M/N = 0.3. Clockwise from upper left: SDA+Linear
Measurements (PSNR=20.19 dB). SDA+Nonlinear Measurements
(PSNR=21.27 dB). D-AMP (PSNR=20.25 dB). TV
(PSNR=14.68 dB). Tiled D-AMP (PSNR=18.54 dB). Overlapping
SDA+Nonlinear Measurements (PSNR=21.88 dB).

Fig. 7: Reconstructed Food and Fork image using different
algorithms with M/N = 0.25. Clockwise from upper left:
SDA+Linear Measurements (PSNR=29.45 dB).
SDA+Nonlinear Measurements (PSNR=31.79 dB). D-AMP
(PSNR=31.90 dB). TV (PSNR=25.16 dB). Tiled D-AMP
(PSNR=30.57 dB). Overlapping SDA+Nonlinear Measurements
(PSNR=32.46 dB).

Figure 8 shows the average probability of successful
recovery for different under-sampling ratios and different
recovery algorithms. To calculate the probability of
successful recovery, we used 1881 Monte Carlo samples. For
each under-sampling ratio $\delta$ and for the j-th Monte
Carlo sample, we define the success variable
$\varphi_{\delta,j} = \mathbb{I}\left( \|\hat{x}^{(j)} - x^{(j)}\|_2 / \|x^{(j)}\|_2 < 0.01 \right)$,
where $x^{(j)}$ is the j-th Monte Carlo sample, $\hat{x}^{(j)}$
denotes the corresponding recovered image, and
$\mathbb{I}(\cdot)$ denotes the indicator function. We then
define the empirical success probability as
$P_\delta = \frac{1}{1881} \sum_{j=1}^{1881} \varphi_{\delta,j}$.
As we can see in Figure 8, for small under-sampling ratios
(less than 0.06), SDA-based methods outperform the D-AMP,
whereas for larger under-sampling ratios the D-AMP
outperforms SDA-based methods. Nevertheless, SDA-based
methods (especially the NL-SDA) outperform the Tiled
D-AMP, P-AMP (employing sparsity in the wavelet domain), and
TV minimization over a large range of under-sampling ratios.
Of course, comparing SDA-based methods with the Tiled D-AMP
is a fairer comparison than comparing them with D-AMP alone.
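The empirical success probability used for Figure 8 can be sketched as follows. This is a minimal illustration, assuming that success is measured by the relative l2 error with a threshold of 0.01; the function names and the toy signals are ours and are not taken from the experiments.

```python
import math


def relative_error(x, x_hat):
    """Relative l2 error ||x_hat - x||_2 / ||x||_2 (assumed error measure)."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_hat, x)))
    den = math.sqrt(sum(a ** 2 for a in x))
    return num / den


def success_probability(originals, recoveries, tol=0.01):
    """Fraction of Monte Carlo samples whose recovery error is below tol."""
    flags = [1 if relative_error(x, x_hat) < tol else 0
             for x, x_hat in zip(originals, recoveries)]
    return sum(flags) / len(flags)


# Toy example with two samples: the first is recovered exactly, the
# second has a relative error of 0.2 and therefore counts as a failure.
originals = [[1.0, 2.0], [3.0, 4.0]]
recoveries = [[1.0, 2.0], [3.0, 5.0]]
p_success = success_probability(originals, recoveries)
```

In the experiments the average would run over the 1881 Monte Carlo samples, once per under-sampling ratio and per algorithm.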

Finally, Figure 9 shows the convergence curve of the
fine-tuning step in training our deep neural network. It shows
the average PSNR (in dB) on test images over different
iterations of the backpropagation algorithm. This figure shows
that, for the under-sampling ratio of 0.06, the NL-SDA method
starts to outperform the D-AMP method after 3.5 x 10"^ runs of
the backpropagation algorithm. The jumps in this plot are due
to feeding the neural network with new training data.
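For reference, the average PSNR plotted in such a convergence curve can be computed as in the sketch below. It uses the standard definition PSNR = 10 log10(peak^2 / MSE); the peak value of 255 (8-bit images) and the function names are assumptions, since the pixel range is not restated here.

```python
import math


def psnr(x, x_hat, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def average_psnr(pairs, peak=255.0):
    """Mean PSNR over (original, recovered) pairs, e.g. a set of test images."""
    return sum(psnr(x, x_hat, peak) for x, x_hat in pairs) / len(pairs)


# Toy example: two tiny "images" whose recoveries both have an MSE of 100.
pairs = [([100.0, 100.0], [110.0, 90.0]),
         ([50.0, 60.0], [60.0, 50.0])]
avg = average_psnr(pairs)
```

Evaluating `average_psnr` on the test set after each block of backpropagation iterations would give one point of the curve.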



Fig. 8: Average probability of successful signal recovery for
different under-sampling ratios and different algorithms. To
calculate the probability of successful recovery, we used
1881 Monte Carlo samples. If we denote the original signal by
$x$ and the recovered signal by $\hat{x}$, then we call a
recovery successful if $\|\hat{x} - x\|_2 / \|x\|_2 < 0.01$.


TABLE II: Running time (in sec.) for recovering the dog image
for different under-sampling ratios and different algorithms.
The winner in each row is the smallest running time (0.002 s,
shared by L-SDA and NL-SDA).

M/N  | L-SDA | NL-SDA | D-AMP   | O-NL-SDA | Tiled D-AMP | TV
-----+-------+--------+---------+----------+-------------+------
0.06 | 0.002 | 0.002  | 74.79   | 1.01     | 43.99       | 45.94
0.1  | 0.002 | 0.002  | 92.21   | 1.03     | 50.00       | 39.82
0.25 | 0.002 | 0.002  | 108.61  | 1.07     | 50.84       | 43.33
0.4  | 0.002 | 0.002  | 1900.68 | 1.22     | 39.55       | 38.83
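As a quick check of the speed-up quoted in the text, the ratios can be computed directly from the running times in Table II. This is only arithmetic on the reported numbers; the variable names are ours.

```python
# Running times in seconds for D-AMP and NL-SDA, copied from Table II,
# keyed by under-sampling ratio M/N.
d_amp_time = {0.06: 74.79, 0.1: 92.21, 0.25: 108.61, 0.4: 1900.68}
nl_sda_time = {0.06: 0.002, 0.1: 0.002, 0.25: 0.002, 0.4: 0.002}

# Speed-up of NL-SDA over D-AMP at each under-sampling ratio.
speedup = {ratio: d_amp_time[ratio] / nl_sda_time[ratio]
           for ratio in d_amp_time}

for ratio in sorted(speedup):
    print(f"M/N = {ratio}: NL-SDA is about {speedup[ratio]:,.0f}x faster")
```

At M/N = 0.4 the ratio is about 950,000, which is the "almost 1,000,000 times" figure; at the smaller ratios the speed-up is in the tens of thousands.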


V. Conclusion and Future Work 

In this work, we developed a new framework for sensing
and recovering structured signals. This framework is able to
learn a structured representation from training data, support
both linear and mildly nonlinear measurements, and efficiently
compute a signal estimate. In particular, we used a stacked
version of denoising autoencoders as an unsupervised feature
learner. We showed how SDA enables us to capture statistical
dependencies between the different elements of certain signals
and improve signal recovery performance as compared to the
CS approach.

We should note that GRBMs treat different components of
the input image vector as conditionally independent given the
hidden layer state. This is an important limitation in modeling
natural images using GRBMs (and, correspondingly, denoising
autoencoders). One important direction for future work is
to come up with a model that can easily be extended to large
images. In addition, it should capture relationships between
pixel intensities rather than assuming that they are independent
conditioned on the hidden layer. Such a model could let us
outperform the D-AMP algorithm in the cases where the SDA-based
methods introduced in this paper were not able to.
























Fig. 9: Convergence of backpropagation over different
iterations, comparing the test result with other methods. In
this plot, the average PSNR is calculated over 1881 test images,
each acquired with an under-sampling ratio of M/N = 0.06.
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